1.
Sensors (Basel) ; 24(3)2024 Jan 30.
Article in English | MEDLINE | ID: mdl-38339617

ABSTRACT

Across five studies, we present the preliminary technical validation of an infant-wearable platform, LittleBeats™, that integrates electrocardiogram (ECG), inertial measurement unit (IMU), and audio sensors. Each sensor modality is validated against data from gold-standard equipment using established algorithms and laboratory tasks. Interbeat interval (IBI) data obtained from the LittleBeats™ ECG sensor indicate acceptable mean absolute percent error rates for both adults (Study 1, N = 16) and infants (Study 2, N = 5) across low- and high-challenge sessions and expected patterns of change in respiratory sinus arrhythmia (RSA). For automated activity recognition (upright vs. walk vs. glide vs. squat) using accelerometer data from the LittleBeats™ IMU (Study 3, N = 12 adults), performance was good to excellent, with smartphone (industry standard) data outperforming LittleBeats™ by less than 4 percentage points. Speech emotion recognition (Study 4, N = 8 adults) applied to LittleBeats™ versus smartphone audio data indicated comparable performance, with no significant difference in error rates. On an automatic speech recognition task (Study 5, N = 12 adults), the best-performing algorithm yielded relatively low word error rates, although the LittleBeats™ error rate (4.16%) was somewhat higher than the smartphone error rate (2.73%). Together, these validation studies indicate that LittleBeats™ sensors yield data quality that is largely comparable to that obtained from gold-standard devices and established protocols used in prior research.
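
As a concrete illustration of the headline metric above, the sketch below computes the mean absolute percent error between interbeat intervals from a wearable ECG and a gold-standard reference; the function and variable names are illustrative assumptions, not part of the LittleBeats™ software.

```python
# Sketch: mean absolute percent error (MAPE) between interbeat intervals (IBIs)
# from a wearable ECG and a gold-standard device, as one plausible way to
# compute the validation metric the abstract reports. Names and numbers are
# illustrative, not from the LittleBeats codebase.
import numpy as np

def mean_absolute_percent_error(ibi_device_ms, ibi_reference_ms):
    """Return MAPE (%) between matched IBI series (both in milliseconds)."""
    device = np.asarray(ibi_device_ms, dtype=float)
    reference = np.asarray(ibi_reference_ms, dtype=float)
    return 100.0 * np.mean(np.abs(device - reference) / reference)

# Example with matched beats from a low-challenge session (made-up numbers).
print(mean_absolute_percent_error([812, 798, 805], [810, 800, 807]))
```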


Subject(s)
Posture , Walking , Adult , Humans , Motion , Walking/physiology , Posture/physiology , Standing Position , Algorithms , Biomechanical Phenomena
2.
JMIR Hum Factors ; 11: e49316, 2024 Feb 08.
Article in English | MEDLINE | ID: mdl-38329785

ABSTRACT

BACKGROUND: Wearable devices permit the continuous, unobtrusive collection of data from children in their natural environments and can transform our understanding of child development. Although the use of wearable devices has begun to emerge in research involving children, few studies have considered families' experiences and perspectives of participating in research of this kind. OBJECTIVE: Through a mixed methods approach, we assessed parents' and children's experiences of using a new wearable device in the home environment. The wearable device was designed specifically for use with infants and young children, and it integrates audio, electrocardiogram, and motion sensors. METHODS: In study 1, semistructured phone interviews were conducted with 42 parents of children aged 1 month to 9.5 years who completed 2 day-long recordings using the device, which the children wore on a specially designed shirt. In study 2, a total of 110 parents of children aged 2 months to 5.5 years responded to a questionnaire assessing their experience of completing 3 day-long device recordings in the home. Guided by the Digital Health Checklist, we assessed parental responses from both studies in relation to the following three key domains: (1) access and usability, (2) privacy, and (3) risks and benefits. RESULTS: In study 1, most parents viewed the device as easy to use and safe and remote visits as convenient. Parents' views on privacy related to the audio recordings were more varied. The use of machine learning algorithms (vs human annotators) in the analysis of the audio data, the ability to stop recordings at any time, and the view that the recordings reflected ordinary family life were some reasons cited by parents who expressed minimal, if any, privacy concerns. Varied risks and benefits were also reported, including perceived child comfort or discomfort, the need to adjust routines to accommodate the study, the understanding gained from the study procedures, and the parent's and child's enjoyment of study participation. In study 2, parents' ratings on 5 close-ended items yielded a similar pattern of findings. Compared with a "neutral" rating, parents were significantly more likely to agree that (1) device instructions were helpful and clear (t109=-45.98; P<.001), (2) they felt comfortable putting the device on their child (t109=-22.22; P<.001), and (3) they felt their child was safe while wearing the device (t109=-34.48; P<.001). They were also less likely to worry about the audio recordings gathered by the device (t108=6.14; P<.001), whereas parents' rating of the burden of the study procedures did not differ significantly from a "neutral" rating (t109=-0.16; P=.87). CONCLUSIONS: On the basis of parents' feedback, several concrete changes can be implemented to improve this new wearable platform and, ultimately, parents' and children's experiences of using child wearable devices in the home setting.


Subject(s)
Wearable Electronic Devices , Humans , Child , Infant , Child, Preschool , Digital Health , Emotions , Algorithms , Checklist
3.
PLoS Comput Biol ; 19(1): e1009061, 2023 01.
Article in English | MEDLINE | ID: mdl-36656910

ABSTRACT

The methods of geometric morphometrics are commonly used to quantify morphology in a broad range of biological sciences. The application of these methods to large datasets is constrained by manual landmark placement limiting the number of landmarks and introducing observer bias. To move the field forward, we need to automate morphological phenotyping in ways that capture comprehensive representations of morphological variation with minimal observer bias. Here, we present Morphological Variation Quantifier (morphVQ), a shape analysis pipeline for quantifying, analyzing, and exploring shape variation in the functional domain. morphVQ uses descriptor learning to estimate the functional correspondence between whole triangular meshes in lieu of landmark configurations. With functional maps between pairs of specimens in a dataset, we can analyze and explore shape variation. morphVQ uses Consistent ZoomOut refinement to improve these functional maps and produce a new representation of shape variation, area-based and conformal (angular) latent shape space differences (LSSDs). We compare this new representation of shape variation to shape variables obtained via manual digitization and auto3DGM, an existing approach to automated morphological phenotyping. We find that LSSDs compare favorably to modern 3DGM and auto3DGM while being more computationally efficient. By characterizing whole surfaces, our method incorporates more morphological detail in shape analysis. We can classify known biological groupings, such as Genus affiliation, with comparable accuracy. The shape spaces produced by our method are similar to those produced by modern 3DGM and by auto3DGM, and distinctiveness functions derived from LSSDs show us how shape variation differs between groups. morphVQ can capture shape in an automated fashion while avoiding the limitations of manually digitized landmarks, and thus represents a novel and computationally efficient addition to the geometric morphometrics toolkit.
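
The downstream classification step described above (assigning a known grouping such as genus from LSSD-derived shape variables) could look roughly like the following sketch; the feature matrix, labels, and use of scikit-learn's LDA are assumptions for illustration, not the morphVQ pipeline itself.

```python
# Sketch: classifying a known biological grouping from precomputed latent shape
# space differences (LSSDs), assuming they have already been reduced to a
# per-specimen feature matrix X with labels y. Random placeholders stand in for
# real shape data.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 20))          # placeholder LSSD-derived features
y = rng.integers(0, 3, size=60)        # placeholder genus labels

clf = LinearDiscriminantAnalysis()
scores = cross_val_score(clf, X, y, cv=5)
print("cross-validated accuracy:", scores.mean())
```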


Subject(s)
Anatomy , Mathematics , Phenotype , Anatomy/methods
4.
Front Artif Intell ; 5: 806274, 2022.
Article in English | MEDLINE | ID: mdl-35647534

ABSTRACT

A language-independent automatic speech recognizer (ASR) is one that can be used for phonetic transcription in languages other than the languages in which it was trained. Language-independent ASR is difficult to train, because different languages implement phones differently: even when phonemes in two different languages are written using the same symbols in the international phonetic alphabet, they are differentiated by different distributions of language-dependent redundant articulatory features. This article demonstrates that the goal of language-independence may be approximated in different ways, depending on the size of the training set, the presence vs. absence of familial relationships between the training and test languages, and the method used to implement phone recognition or classification. When the training set contains many languages, and when every language in the test set is related to (shares a language family with) a language in the training set, then language-independent ASR may be trained using an empirical risk minimization strategy (e.g., using connectionist temporal classification without extra regularizers). When the training set is limited to a small number of languages from one language family, however, and the test languages are not from the same language family, then the best performance is achieved by using domain-invariant representation learning strategies. Two different representation learning strategies are tested in this article: invariant risk minimization, and regret minimization. We find that invariant risk minimization is better at the task of phone token classification (given known segment boundary times), while regret minimization is better at the task of phone token recognition.
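
A minimal sketch of the invariant risk minimization strategy named above, assuming a PyTorch model and one mini-batch per training language; the IRMv1-style dummy-scale penalty shown here is one common formulation, not necessarily the authors' exact implementation.

```python
# Sketch: an IRMv1-style penalty applied per training language (environment).
# A hedged illustration of the invariant risk minimization strategy named in
# the abstract, not the authors' implementation.
import torch
import torch.nn.functional as F

def irm_penalty(logits, targets):
    """Squared gradient of the risk w.r.t. a dummy classifier scale."""
    scale = torch.ones(1, requires_grad=True)
    loss = F.cross_entropy(logits * scale, targets)
    grad = torch.autograd.grad(loss, [scale], create_graph=True)[0]
    return (grad ** 2).sum()

def total_loss(per_language_batches, model, penalty_weight=1.0):
    """Mean empirical risk plus weighted mean IRM penalty over languages."""
    risks, penalties = [], []
    for features, targets in per_language_batches:   # one batch per language
        logits = model(features)
        risks.append(F.cross_entropy(logits, targets))
        penalties.append(irm_penalty(logits, targets))
    return torch.stack(risks).mean() + penalty_weight * torch.stack(penalties).mean()
```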

5.
Article in English | MEDLINE | ID: mdl-35291257

ABSTRACT

We design a framework for studying prelinguistic child voice from 3 to 24 months based on state-of-the-art algorithms in diarization. Our system consists of a time-invariant feature extractor, a context-dependent embedding generator, and a classifier. We study the effect of swapping out different components of the system, as well as changing the loss function, to find the best performance. We also present a multiple-instance learning technique that allows us to pre-train our parameters on larger datasets with coarser segment boundary labels. We found that our best system achieved 43.8% DER on the test dataset, compared to 55.4% DER achieved by the LENA software. We also found that using a convolutional feature extractor instead of log-mel features significantly increases the performance of neural diarization.
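
The sketch below shows one simple way to compute a frame-level diarization error rate of the kind reported above, assuming reference and hypothesis labels on a shared frame grid and an already-resolved speaker mapping (standard DER scoring additionally finds the optimal mapping, e.g. with pyannote.metrics or md-eval).

```python
# Sketch: frame-level diarization error rate (DER) given per-frame labels,
# with None marking non-speech. Illustrative only; the mapping between
# reference and hypothesis speakers is assumed to be fixed in advance.
def frame_level_der(reference, hypothesis, non_speech=None):
    assert len(reference) == len(hypothesis)
    speech_frames = miss = false_alarm = confusion = 0
    for ref, hyp in zip(reference, hypothesis):
        if ref != non_speech:
            speech_frames += 1
            if hyp == non_speech:
                miss += 1
            elif hyp != ref:
                confusion += 1
        elif hyp != non_speech:
            false_alarm += 1
    return (miss + false_alarm + confusion) / max(speech_frames, 1)

print(frame_level_der(["child", "child", None, "adult"],
                      ["child", "adult", "adult", "adult"]))
```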

6.
Speech Commun ; 133: 41-61, 2021 Oct.
Article in English | MEDLINE | ID: mdl-36062214

ABSTRACT

Classification of infant and parent vocalizations, particularly emotional vocalizations, is critical to understanding how infants learn to regulate emotions in social dyadic processes. This work is an experimental study of classifiers, features, and data augmentation strategies applied to the task of classifying infant and parent vocalization types. Our data were recorded both in the home and in the laboratory. Infant vocalizations were manually labeled as cry, fus (fuss), lau (laugh), bab (babble) or scr (screech), while parent (mostly mother) vocalizations were labeled as ids (infant-directed speech), ads (adult-directed speech), pla (playful), rhy (rhythmic speech or singing), lau (laugh) or whi (whisper). Linear discriminant analysis (LDA) was selected as a baseline classifier, because it gave the highest accuracy in a previously published study covering part of this corpus. LDA was compared to two neural network architectures: a two-layer fully-connected network (FCN), and a convolutional neural network with self-attention (CNSA). Baseline features extracted using the OpenSMILE toolkit were augmented by extra voice quality, phonetic, and prosodic features, each targeting perceptual features of one or more of the vocalization types. Three data augmentation and transfer learning methods were tested: pre-training of network weights for a related task (adult emotion classification), augmentation of under-represented classes using data uniformly sampled from other corpora, and augmentation of under-represented classes using data selected by a minimum cross-corpus information difference criterion. Feature selection using Fisher scores and experiments using weighted and unweighted samplers were also tested. Two datasets were evaluated: a benchmark dataset (CRIED) and our own corpus. On the CRIED dataset, the CNSA achieved the best unweighted-average recall (UAR) compared with previous studies. In terms of classification accuracy, weighted F1, and macro F1 on our own dataset, the neural networks both significantly outperformed LDA; the FCN slightly (but not significantly) outperformed the CNSA. Cross-examining features selected by different feature selection algorithms permits a type of post-hoc feature analysis, in which the most important acoustic features for each binary type discrimination are listed. Examples of each vocalization type, chosen using the overlapping selected features, are presented as spectrograms and discussed with respect to the type-discriminative acoustic features selected by the various algorithms. MFCC, log Mel frequency band energy, LSP frequency, and F1 are found to be the most important spectral envelope features; F0 is found to be the most important prosodic feature.
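
A brief sketch of the Fisher-score feature ranking mentioned above, assuming an (n_samples, n_features) acoustic feature matrix and integer vocalization-type labels; the random data and the exact normalization are illustrative assumptions.

```python
# Sketch: Fisher-score ranking of acoustic features for vocalization-type
# classification. Illustrative only; not the authors' exact recipe.
import numpy as np

def fisher_scores(X, y):
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    overall_mean = X.mean(axis=0)
    numerator = np.zeros(X.shape[1])
    denominator = np.zeros(X.shape[1])
    for label in np.unique(y):
        Xc = X[y == label]
        numerator += len(Xc) * (Xc.mean(axis=0) - overall_mean) ** 2
        denominator += len(Xc) * Xc.var(axis=0)
    return numerator / np.maximum(denominator, 1e-12)

X = np.random.default_rng(1).normal(size=(100, 8))        # placeholder features
y = np.random.default_rng(2).integers(0, 5, size=100)     # e.g. cry/fuss/laugh/babble/screech
print(np.argsort(fisher_scores(X, y))[::-1])               # features ranked by score
```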

7.
J Exp Psychol Appl ; 25(1): 41-61, 2019 Mar.
Article in English | MEDLINE | ID: mdl-30688498

ABSTRACT

Patient portals to Electronic Health Record (EHR) systems are underused by older adults because of limited system usability and usefulness, including difficulty understanding numeric information. We investigated whether enhanced context for portal messages about test results improved responses to these messages, comparing verbally, graphically, and video-enhanced formats. Older adults viewed scenarios with fictitious patient profiles and messages describing results for these patients from cholesterol or diabetes screening tests indicating lower, borderline, or higher risk levels. These messages were conveyed by a standard format (table of numerical test scores) or one of the enhanced formats. Verbatim and gist memory for test results, risk perception, affective response, attitude toward and intention to perform self-care behaviors, and satisfaction were measured. Verbally and video-enhanced context improved older adults' gist but not verbatim memory compared to the standard format, suggesting we were successful in designing messages that highlight gist-based information. Little evidence was found for benefits related to the graphically enhanced format. Although verbally and video-enhanced formats improved gist memory and message satisfaction, they had less impact on the other responses to the messages. However, these responses reflected level of risk: as risk associated with test results increased, positive affect decreased whereas negative affect, perceived risk, behavioral attitudes, and intentions increased, as predicted by behavioral change theories.


Subject(s)
Comprehension , Electronic Health Records , Health Literacy , Patient Portals , Aged , Computer Graphics , Female , Humans , Male , Risk Assessment , Self Care
8.
J Acoust Soc Am ; 143(6): 3207, 2018 06.
Article in English | MEDLINE | ID: mdl-29960420

ABSTRACT

Most mainstream automatic speech recognition (ASR) systems consider all feature frames equally important. However, acoustic landmark theory is based on a contradictory idea that some frames are more important than others. Acoustic landmark theory exploits quantal nonlinearities in the articulatory-acoustic and acoustic-perceptual relations to define landmark times at which the speech spectrum abruptly changes or reaches an extremum; frames overlapping landmarks have been demonstrated to be sufficient for speech perception. In this work, experiments are conducted on the TIMIT corpus, with both Gaussian mixture model (GMM) and deep neural network (DNN)-based ASR systems, and it is found that frames containing landmarks are more informative for ASR than others. It is discovered that altering the level of emphasis on landmarks by re-weighting acoustic likelihood tends to reduce the phone error rate (PER). Furthermore, by leveraging the landmark as a heuristic, one of the hybrid DNN frame-dropping strategies maintained a PER within 0.44% of optimal while scoring less than half (45.8%, to be precise) of the frames. This hybrid strategy outperforms other non-heuristic-based methods and demonstrates the potential of landmarks for reducing computation.
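
Two toy functions below illustrate the landmark-based ideas described above: scaling the acoustic log-likelihood contribution of landmark frames, and keeping only frames near landmarks to reduce computation. The boost factor and window size are arbitrary placeholders, not values from the paper.

```python
# Sketch: landmark-based likelihood re-weighting and frame selection.
# Illustrative placeholders only.
import numpy as np

def reweight_loglik(frame_loglik, landmark_frames, boost=1.5):
    """Scale the log-likelihood contribution of frames that overlap a landmark."""
    loglik = np.array(frame_loglik, dtype=float)
    loglik[list(landmark_frames)] *= boost
    return loglik

def keep_frames_near_landmarks(num_frames, landmark_frames, window=2):
    """Indices of frames within `window` frames of any landmark."""
    keep = set()
    for lm in landmark_frames:
        keep.update(range(max(0, lm - window), min(num_frames, lm + window + 1)))
    return sorted(keep)

print(keep_frames_near_landmarks(20, landmark_frames=[3, 11], window=2))
```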

9.
AMIA Annu Symp Proc ; 2018: 185-194, 2018.
Article in English | MEDLINE | ID: mdl-30815056

ABSTRACT

In an effort to guide the development of a computer agent (CA)-based adviser system that presents patient-centered language to older adults (e.g., medication instructions in portal environments or smartphone apps), we evaluated 360 older and younger adults' responses to medication information delivered by a set of CAs. We assessed patient memory for medication information, their affective responses to the information, their perception of the CA's teaching effectiveness and expressiveness, and their perceived level of similarity with each CA. Each participant saw CAs varying in appearance and levels of realism (Photo-realistic vs Cartoon vs Emoji, as control condition). To investigate the impact of affective cues on patients, we varied CA message framing, with effects described either as gains of taking or losses of not taking the medication. Our results corroborate the idea that CAs can produce a significant effect on older adults' learning in part by engendering social responses.


Subject(s)
Communication , Medication Therapy Management , Software , Translating , Adult , Age Factors , Aged , Audiovisual Aids , Female , Health Literacy , Humans , Male , Memory , Middle Aged , Unified Medical Language System
10.
J Acoust Soc Am ; 142(2): 792, 2017 08.
Article in English | MEDLINE | ID: mdl-28863599

ABSTRACT

The best actors, particularly classic Shakespearian actors, are experts at vocal expression. With prosodic inflection, change of voice quality, and non-textual utterances, they communicate emotion, emphasize ideas, create drama, and form a complementary language that works with the text to tell the story in the script. To begin to study selected elements of vocal expression in acted speech, corpora were curated from male actors' Hamlet and female actors' Lady Macbeth soliloquy performances. L1 speakers of American English on Mechanical Turk listened to excerpts from the corpora and provided descriptions of the speaker's vocal expression. In this exploratory, open-ended, mixed-methods study, approximately 60% of all responses described emotion, and the remainder of responses split evenly between voice quality (including effort levels) and prosody. Significant differences were also found in the kind and quantity of descriptors applied to male and female speech. Perception-grounded male and female acoustic feature sets that tracked the actors' expressive effort levels through the continuum of whispered, breathy, modal, and resonant speech are presented and validated via multiple models. The best results in applying these features to simple, un-optimized, four-way decision tree classifiers yielded 76% accuracy for male and 73% accuracy for female expressive, acted speech.
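
A minimal sketch of the simple, un-optimized four-way decision tree mentioned above, assuming a per-excerpt acoustic feature vector and effort-level labels; the random features stand in for the perception-grounded feature sets the paper derives.

```python
# Sketch: 4-way classification of effort level (whispered / breathy / modal /
# resonant) with an un-tuned decision tree. Features here are random stand-ins.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 12))                       # placeholder acoustic features
y = rng.choice(["whispered", "breathy", "modal", "resonant"], size=200)

accuracy = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5).mean()
print(f"4-way accuracy: {accuracy:.2f}")
```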


Subject(s)
Acoustics , Auditory Perception , Emotions , Speech Acoustics , Voice Quality , Female , Humans , Judgment , Loudness Perception , Male , Pitch Perception , Sex Factors , Speech Production Measurement
11.
J Biomed Inform ; 69: 63-74, 2017 05.
Article in English | MEDLINE | ID: mdl-28347856

ABSTRACT

We describe a project intended to improve the use of Electronic Medical Record (EMR) patient portal information by older adults with diverse numeracy and literacy abilities, so that portals can better support patient-centered care. Patient portals are intended to bridge patients and providers by ensuring patients have continuous access to their health information and services. However, they are underutilized, especially by older adults with low health literacy, because they often function more as information repositories than as tools to engage patients. We outline an interdisciplinary approach to designing and evaluating portal-based messages that convey clinical test results so as to support patient-centered care. We first describe a theory-based framework for designing effective messages for patients. This involves analyzing shortcomings of the standard portal message format (presenting numerical test results with little context to guide comprehension) and developing verbally, graphically, video- and computer agent-based formats that enhance context. The framework encompasses theories from cognitive and behavioral science (health literacy, fuzzy trace memory, behavior change) as well as computational/engineering approaches (e.g., image and speech processing models). We then describe an approach to evaluating whether the formats improve comprehension of and responses to the messages about test results, focusing on our methods. The approach combines quantitative (e.g., response accuracy, Likert scale responses) and qualitative (interview) measures, as well as experimental and individual difference methods in order to investigate which formats are more effective, and whether some formats benefit some types of patients more than others. We also report the results of two pilot studies conducted as part of developing the message formats.


Subject(s)
Electronic Health Records , Patient Portals , Self Care , Aged , Health Literacy , Humans , Interdisciplinary Communication , Middle Aged , Patient Care , Patient-Centered Care
12.
J Acoust Soc Am ; 132(6): 3980-9, 2012 Dec.
Article in English | MEDLINE | ID: mdl-23231127

ABSTRACT

Speech can be represented as a constellation of constricting vocal tract actions called gestures, whose temporal patterning with respect to one another is expressed in a gestural score. Current speech datasets do not come with gestural annotation and no formal gestural annotation procedure exists at present. This paper describes an iterative analysis-by-synthesis landmark-based time-warping architecture to perform gestural annotation of natural speech. For a given utterance, the Haskins Laboratories Task Dynamics and Application (TADA) model is employed to generate a corresponding prototype gestural score. The gestural score is temporally optimized through an iterative timing-warping process such that the acoustic distance between the original and TADA-synthesized speech is minimized. This paper demonstrates that the proposed iterative approach is superior to conventional acoustically-referenced dynamic timing-warping procedures and provides reliable gestural annotation for speech datasets.
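
The acoustic-distance core of such an analysis-by-synthesis loop might look like the sketch below, assuming MFCC matrices for the natural and TADA-synthesized utterances; the iterative warping of gestural timings is not shown, only the DTW distance the loop would try to minimize.

```python
# Sketch: dynamic time warping (DTW) distance between two MFCC sequences,
# the quantity an analysis-by-synthesis loop could minimize when adjusting
# gestural timings. Random matrices stand in for real speech features.
import numpy as np

def dtw_distance(mfcc_a, mfcc_b):
    """DTW distance between two (frames x coeffs) matrices."""
    na, nb = len(mfcc_a), len(mfcc_b)
    cost = np.full((na + 1, nb + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, na + 1):
        for j in range(1, nb + 1):
            d = np.linalg.norm(mfcc_a[i - 1] - mfcc_b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[na, nb]

rng = np.random.default_rng(4)
print(dtw_distance(rng.normal(size=(40, 13)), rng.normal(size=(50, 13))))
```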


Subject(s)
Acoustics , Gestures , Glottis/physiology , Mouth/physiology , Speech Acoustics , Voice Quality , Biomechanical Phenomena , Female , Humans , Male , Models, Theoretical , Signal Processing, Computer-Assisted , Sound Spectrography , Speech Production Measurement/methods , Time Factors
13.
Clin Linguist Phon ; 26(9): 806-22, 2012 Sep.
Article in English | MEDLINE | ID: mdl-22876770

ABSTRACT

A multimodal approach combining acoustics, intelligibility ratings, articulography, and surface electromyography was used to examine the characteristics of dysarthria due to cerebral palsy (CP). CV syllables were studied by obtaining the slope of the F2 transition during the diphthong, tongue-jaw kinematics during the release of the onset consonant, and the related submental muscle activities, and by relating these measures to speech intelligibility. The results show that larger reductions of F2 slope are correlated with lower intelligibility in CP-related dysarthria. Among the three speakers with CP, the speaker with the lowest F2 slope and intelligibility showed the smallest tongue release movement and the largest jaw opening movement. The other two speakers with CP were comparable in the amplitude and velocity of tongue movements, but one speaker had abnormally prolonged jaw movement. The tongue-jaw coordination pattern found in the speakers with CP could be either compensatory or subject to an incompletely developed oromotor control system.
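
One straightforward way to obtain the F2 transition slope used above is a least-squares fit of the formant track against time, as in the sketch below; the formant values and timing are illustrative assumptions.

```python
# Sketch: F2 transition slope over a diphthong via linear regression of the
# formant track against time. Numbers are illustrative only.
import numpy as np

def f2_slope_hz_per_s(times_s, f2_hz):
    """Least-squares slope of F2 (Hz) over time (s)."""
    slope, _intercept = np.polyfit(np.asarray(times_s), np.asarray(f2_hz), deg=1)
    return slope

times = np.linspace(0.0, 0.15, 16)                  # 150 ms diphthong
f2 = 1200 + 3000 * times + np.random.default_rng(5).normal(0, 20, times.size)
print(f"{f2_slope_hz_per_s(times, f2):.0f} Hz/s")
```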


Subject(s)
Cerebral Palsy/physiopathology , Dysarthria/physiopathology , Speech Intelligibility/physiology , Speech/physiology , Adult , Biomechanical Phenomena/physiology , Cerebral Palsy/complications , Dysarthria/etiology , Electromyography/methods , Female , Humans , Jaw/physiology , Male , Models, Biological , Phonetics , Speech Acoustics , Speech Production Measurement/methods , Tongue/physiology , Young Adult
14.
IEEE Trans Pattern Anal Mach Intell ; 34(5): 959-71, 2012 May.
Article in English | MEDLINE | ID: mdl-21844626

ABSTRACT

Content-based multimedia indexing, retrieval, and processing as well as multimedia databases demand the structuring of the media content (image, audio, video, text, etc.), one significant goal being to associate the identity of the content to the individual segments of the signals. In this paper, we specifically address the problem of speaker clustering, the task of assigning every speech utterance in an audio stream to its speaker. We offer a complete treatment to the idea of partially supervised speaker clustering, which refers to the use of our prior knowledge of speakers in general to assist the unsupervised speaker clustering process. By means of an independent training data set, we encode the prior knowledge at the various stages of the speaker clustering pipeline via 1) learning a speaker-discriminative acoustic feature transformation, 2) learning a universal speaker prior model, and 3) learning a discriminative speaker subspace, or equivalently, a speaker-discriminative distance metric. We study the directional scattering property of the Gaussian mixture model (GMM) mean supervector representation of utterances in the high-dimensional space, and advocate exploiting this property by using the cosine distance metric instead of the Euclidean distance metric for speaker clustering in the GMM mean supervector space. We propose to perform discriminant analysis based on the cosine distance metric, which leads to a novel distance metric learning algorithm: linear spherical discriminant analysis (LSDA). We show that the proposed LSDA formulation can be systematically solved within the elegant graph embedding general dimensionality reduction framework. Our speaker clustering experiments on the GALE database clearly indicate that 1) our speaker clustering methods based on the GMM mean supervector representation and vector-based distance metrics outperform traditional speaker clustering methods based on the "bag of acoustic features" representation and statistical model-based distance metrics, 2) our advocated use of the cosine distance metric yields consistent increases in the speaker clustering performance as compared to the commonly used Euclidean distance metric, 3) our partially supervised speaker clustering concept and strategies significantly improve the speaker clustering performance over the baselines, and 4) our proposed LSDA algorithm further leads to state-of-the-art speaker clustering performance.
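
A compact sketch of cosine-distance speaker clustering in the GMM mean supervector space, the metric the paper advocates; the random supervectors and the choice of average-linkage hierarchical clustering are assumptions for illustration, not the paper's exact pipeline.

```python
# Sketch: hierarchical clustering of GMM mean supervectors with cosine distance.
# Supervectors here are random placeholders, not GALE data.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

rng = np.random.default_rng(6)
supervectors = rng.normal(size=(30, 512))           # one supervector per utterance

cosine_distances = pdist(supervectors, metric="cosine")
tree = linkage(cosine_distances, method="average")
labels = fcluster(tree, t=4, criterion="maxclust")  # assume 4 speakers
print(labels)
```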


Subject(s)
Artificial Intelligence , Pattern Recognition, Automated/methods , Speech/classification , Cluster Analysis , Discriminant Analysis , Humans , Signal Processing, Computer-Assisted
15.
Folia Phoniatr Logop ; 63(4): 187-94, 2011.
Article in English | MEDLINE | ID: mdl-20938200

ABSTRACT

BACKGROUND/AIMS: This study examined the spectral characteristics of American English vowels in dysarthria associated with cerebral palsy (CP), and investigated the relationship between a speaker's overall speech intelligibility and vowel contrast. METHODS: The data were collected from 12 American English native speakers (9 speakers with a diagnosis of CP and 3 controls). Primary measures were F1 and F2 frequencies of 3 corner vowels /i, a, u/ and 3 noncorner vowels /I, 3, */. Six acoustic variables were derived from the formant measures and were regressed against intelligibility: corner vowel space, noncorner vowel space, mean distance between vowels, F1 and F2 variability, and degree of overlap among vowels. RESULTS: First, the effect of vowel was significant for both F1 and F2 measures for all speakers, but post hoc analysis revealed a reduced distinction at lower intelligibility. Second, regression functions relating intelligibility and acoustic variables were significant for degree of overlap among vowels, F1 variability, corner vowel space, and mean distance between vowels. Degree of overlap among vowels accounted for the greatest amount of variance in intelligibility scores. CONCLUSION: A speaker's overall intelligibility in dysarthric speech is better represented by the degree of overlap among vowels than by the vowel space.
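
Two of the acoustic variables above can be computed directly from per-vowel mean formant values, as in the sketch below: the corner vowel space area (triangle /i, a, u/ via the shoelace formula) and the mean pairwise distance between vowel means. The formant numbers are illustrative.

```python
# Sketch: corner vowel space area and mean pairwise distance between vowels,
# computed from per-vowel mean (F1, F2) values in Hz. Numbers are illustrative.
import itertools
import numpy as np

def polygon_area(points_f1_f2):
    pts = np.asarray(points_f1_f2, dtype=float)
    x, y = pts[:, 0], pts[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

def mean_pairwise_distance(points_f1_f2):
    pts = np.asarray(points_f1_f2, dtype=float)
    return np.mean([np.linalg.norm(a - b) for a, b in itertools.combinations(pts, 2)])

corner = [(300, 2300), (750, 1200), (350, 900)]     # /i/, /a/, /u/ means (F1, F2)
print(polygon_area(corner), mean_pairwise_distance(corner))
```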


Subject(s)
Dysarthria/physiopathology , Phonetics , Speech Acoustics , Speech Intelligibility , Adolescent , Adult , Cerebral Palsy/complications , Cerebral Palsy/physiopathology , Dysarthria/etiology , Female , Humans , Male , Middle Aged , Sound Spectrography , Young Adult
16.
Clin Linguist Phon ; 24(10): 759-70, 2010 Oct.
Article in English | MEDLINE | ID: mdl-20831376

ABSTRACT

This paper analyses consonant articulation errors in dysarthric speech produced by seven American-English native speakers with cerebral palsy. Twenty-three consonant phonemes were transcribed with diacritics as necessary in order to represent non-phoneme misarticulations. Error frequencies were examined with respect to six variables: articulatory complexity, place of articulation, and manner of articulation of the target phoneme; and change in articulatory complexity, place, and manner resulting from the misarticulation. Results showed that target phonemes with high articulatory complexity were most often misarticulated, independent of intelligibility, but low-intelligibility speakers reduced the complexity of target consonants more frequently. All speakers tended to misarticulate to a place adjacent to the target place, but this pattern was most prominent for high-intelligibility speakers. Low- and mid-intelligibility speakers produced more manner errors than high-intelligibility speakers. Based on these results, a two-part model of consonant articulation errors is proposed for CP-associated spastic dysarthria.


Subject(s)
Cerebral Palsy/physiopathology , Dysarthria/physiopathology , Models, Biological , Phonetics , Speech Intelligibility/physiology , Adolescent , Adult , Cerebral Palsy/complications , Dysarthria/etiology , Female , Humans , Male , Middle Aged , Severity of Illness Index , Speech Production Measurement , Young Adult
17.
J Acoust Soc Am ; 123(4): 2043-53, 2008 Apr.
Article in English | MEDLINE | ID: mdl-18397012

ABSTRACT

In this paper, an acoustic model for the robustness analysis of optimal multipoint room equalization is proposed. Optimal multipoint equalization aims to achieve the best performance, in a least-squares sense, across all measured points. The model can be used for theoretical robustness estimation as a function of the critical design parameters, such as the number of measurement points, the distance between measurements, or the frequency, before a real equalization system is applied. The analysis results show that it is important to set an appropriate number of measurement points and appropriate distances between measurement points to ensure an enlarged equalization region at a specific frequency.
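
At each frequency, the least-squares multipoint equalizer has a closed form, sketched below under the assumption that the complex room responses at the measurement points are already available; the regularization term is an added assumption for numerical stability, not part of the paper's model.

```python
# Sketch: frequency-domain least-squares multipoint equalization. At each bin,
# the gain minimizing the summed squared error to a flat target across M points
# is G(f) = sum_m conj(H_m(f)) / sum_m |H_m(f)|^2.
import numpy as np

def multipoint_equalizer(room_responses, regularization=1e-6):
    """room_responses: complex array of shape (M points, F frequency bins)."""
    H = np.asarray(room_responses)
    numerator = np.conj(H).sum(axis=0)
    denominator = (np.abs(H) ** 2).sum(axis=0) + regularization
    return numerator / denominator                   # one equalizer gain per bin

rng = np.random.default_rng(7)
H = rng.normal(size=(4, 256)) + 1j * rng.normal(size=(4, 256))
G = multipoint_equalizer(H)
print(np.mean(np.abs(H * G - 1.0) ** 2))             # residual error across points
```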


Subject(s)
Acoustics , Architecture , Models, Statistical , Humans , Psychoacoustics , Sound
18.
Neuroimage ; 39(3): 1333-44, 2008 Feb 01.
Article in English | MEDLINE | ID: mdl-18023366

ABSTRACT

Stuttering is a developmental speech disorder that occurs in 5% of children with spontaneous remission in approximately 70% of cases. Previous imaging studies in adults with persistent stuttering found left white matter deficiencies and reversed right-left asymmetries compared to fluent controls. We hypothesized that similar differences might be present indicating brain development differences in children at risk of stuttering. Optimized voxel-based morphometry compared gray matter volume (GMV) and diffusion tensor imaging measured fractional anisotropy (FA) in white matter tracts in 3 groups: children with persistent stuttering, children recovered from stuttering, and fluent peers. Both the persistent stuttering and recovered groups had reduced GMV from normal in speech-relevant regions: the left inferior frontal gyrus and bilateral temporal regions. Reduced FA was found in the left white matter tracts underlying the motor regions for face and larynx in the persistent stuttering group. Contrary to previous findings in adults who stutter, no increases were found in the right hemisphere speech regions in stuttering or recovered children and no differences in right-left asymmetries. Instead, a risk for childhood stuttering was associated with deficiencies in left gray matter volume while reduced white matter integrity in the left hemisphere speech system was associated with persistent stuttering. Anatomical increases in right hemisphere structures previously found in adults who stutter may have resulted from a lifetime of stuttering. These findings point to the importance of considering the role of neuroplasticity during development when studying persistent forms of developmental disorders in adults.


Subject(s)
Brain/pathology , Stuttering/pathology , Anisotropy , Brain Mapping , Child , Child, Preschool , Disease Progression , Functional Laterality/physiology , Humans , Language Tests , Linear Models , Magnetic Resonance Imaging , Male , Nerve Net/pathology , Stuttering/psychology
19.
J Acoust Soc Am ; 118(4): 2579-87, 2005 Oct.
Article in English | MEDLINE | ID: mdl-16266178

ABSTRACT

Acoustic cues related to the voice source, including harmonic structure and spectral tilt, were examined for relevance to prosodic boundary detection. The measurements considered here comprise five categories: duration, pitch, harmonic structure, spectral tilt, and amplitude. Distributions of the measurements and statistical analysis show that the measurements may be used to differentiate between prosodic categories. Detection experiments on the Boston University Radio Speech Corpus show equal error detection rates around 70% for accent and boundary detection, using only the acoustic measurements described, without any lexical or syntactic information. Further investigation of the detection results shows that duration and amplitude measurements, and, to a lesser degree, pitch measurements, are useful for detecting accents, while all voice source measurements except pitch measurements are useful for boundary detection.
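
A short sketch of locating an equal-error operating point from detector scores, the kind of threshold-free summary reported above; the labels and scores are random placeholders, and the ROC-based formulation here is a generic illustration rather than the paper's exact scoring.

```python
# Sketch: equal-error operating point of a prosodic boundary detector,
# i.e. the threshold where the miss rate and false alarm rate are closest.
import numpy as np
from sklearn.metrics import roc_curve

def equal_error_point(labels, scores):
    """Return (threshold, detection rate) where miss rate ~= false alarm rate."""
    fpr, tpr, thresholds = roc_curve(labels, scores)
    idx = np.argmin(np.abs((1 - tpr) - fpr))
    return thresholds[idx], tpr[idx]

rng = np.random.default_rng(8)
labels = rng.integers(0, 2, size=500)                # 1 = prosodic boundary
scores = labels + rng.normal(0, 0.8, size=500)       # noisy detector scores
print(equal_error_point(labels, scores))
```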


Subject(s)
Cues , Speech Acoustics , Speech/physiology , Analysis of Variance , Humans , Speech Production Measurement
20.
Article in English | MEDLINE | ID: mdl-19212454

ABSTRACT

Three research prototype speech recognition systems are described, all of which use recently developed methods from artificial intelligence (specifically support vector machines, dynamic Bayesian networks, and maximum entropy classification) in order to implement, in the form of an automatic speech recognizer, current theories of human speech perception and phonology (specifically landmark-based speech perception, nonlinear phonology, and articulatory phonology). All three systems begin with a high-dimensional multiframe acoustic-to-distinctive feature transformation, implemented using support vector machines trained to detect and classify acoustic phonetic landmarks. Distinctive feature probabilities estimated by the support vector machines are then integrated using one of three pronunciation models: a dynamic programming algorithm that assumes canonical pronunciation of each word, a dynamic Bayesian network implementation of articulatory phonology, or a discriminative pronunciation model trained using the methods of maximum entropy classification. Log probability scores computed by these models are then combined, using log-linear combination, with other word scores available in the lattice output of a first-pass recognizer, and the resulting combination score is used to compute a second-pass speech recognition output.
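
The final log-linear combination step described above can be sketched as a weighted sum of log-probability scores per lattice edge, as below; the score names and weights are illustrative assumptions rather than the systems' actual configuration.

```python
# Sketch: log-linear combination of per-word scores for second-pass rescoring.
# Score names and weights are placeholders.
import math

def combined_score(first_pass_scores, landmark_score, weights):
    """Weighted sum of log-probability scores for one lattice edge (word hypothesis)."""
    scores = dict(first_pass_scores, landmark=landmark_score)
    return sum(weights[name] * score for name, score in scores.items())

edge = {"acoustic": math.log(1e-4), "language_model": math.log(0.02)}
print(combined_score(edge, landmark_score=math.log(0.1),
                     weights={"acoustic": 1.0, "language_model": 0.7, "landmark": 0.5}))
```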
